Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond|arXiv(2023)
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
DOI: https://doi.org/10.48550/arXiv.2308.12966
Large Language Model (LLM)
Vision-Language Models (VLMs)
Qwen